OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Address extraction using Hidden Markov Models

Internal identifier: 001419 (Main/Exploration); previous: 001418; next: 001420

Authors: Kazem Taghva [United States]; Jeffrey Coombs [United States]; Ray Pereda [United States]; Thomas Nartker [United States]

RBID: Pascal:05-0359645

Abstract

This paper presents the implementation and evaluation of a Hidden Markov Model to extract addresses from OCR text. Although Hidden Markov Models discover addresses with high precision and recall, this type of Information Extraction task seems to be affected negatively by the presence of OCR text.
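
As a rough illustration of the general technique named in the abstract (not the authors' model, whose states, features and parameters are not given on this page), the sketch below tags each token as inside an address or not with a two-state Hidden Markov Model decoded by the Viterbi algorithm, then scores the tags with token-level precision and recall. The state names `ADDR`/`OTHER`, the coarse token classes (`NUM`, `ZIP`, `STATE`, `CAP`, `WORD`), the tokenizer and every probability are hypothetical, hand-picked values rather than trained ones.

```python
import math
import re

# Hypothetical two-state HMM: "ADDR" marks tokens inside a postal address,
# "OTHER" everything else. All probabilities are illustrative, not trained.
STATES = ("OTHER", "ADDR")
START = {"OTHER": 0.8, "ADDR": 0.2}
TRANS = {
    "OTHER": {"OTHER": 0.9, "ADDR": 0.1},
    "ADDR": {"OTHER": 0.2, "ADDR": 0.8},
}
# Emission probabilities over coarse token classes (each row sums to 1).
EMIT = {
    "OTHER": {"NUM": 0.05, "ZIP": 0.01, "STATE": 0.01, "CAP": 0.23, "WORD": 0.70},
    "ADDR": {"NUM": 0.25, "ZIP": 0.20, "STATE": 0.20, "CAP": 0.30, "WORD": 0.05},
}


def tokenize(text):
    """Split text into alphabetic words and digit groups (e.g. '89154-4021')."""
    return re.findall(r"[A-Za-z]+|\d+(?:-\d+)?", text)


def token_class(token):
    """Map a token to a coarse observation symbol (hypothetical feature set)."""
    if re.fullmatch(r"\d{5}(?:-\d{4})?", token):
        return "ZIP"
    if token.isdigit():
        return "NUM"
    if re.fullmatch(r"[A-Z]{2}", token):
        return "STATE"
    if token[:1].isupper():
        return "CAP"
    return "WORD"


def viterbi(tokens):
    """Most probable state sequence for the tokens, computed in log space."""
    if not tokens:
        return []
    obs = [token_class(t) for t in tokens]
    # best[s]: log-probability of the best path that ends in state s.
    best = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []  # back[i][s]: best predecessor of state s at position i + 1
    for o in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in STATES:
            score, arg = max((prev[r] + math.log(TRANS[r][s]), r) for r in STATES)
            best[s] = score + math.log(EMIT[s][o])
            pointers[s] = arg
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    state = max(STATES, key=best.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def precision_recall(predicted, gold, label="ADDR"):
    """Token-level precision and recall for one label."""
    tp = sum(p == g == label for p, g in zip(predicted, gold))
    n_pred = sum(p == label for p in predicted)
    n_gold = sum(g == label for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall
```

With these made-up parameters, `viterbi(tokenize("see 1200 Desert Road, Las Vegas, NV 89154-4021 today"))` tags the seven tokens from `1200` through `89154-4021` as `ADDR` and the first and last words as `OTHER`. Because a model of this kind only sees token classes, an OCR error that turns a digit into a letter changes the token sequence and its observation symbols, which is one plausible way noisy OCR input could degrade such an extractor.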

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Address extraction using Hidden Markov Models</title>
<author>
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Coombs, Jeffrey" sort="Coombs, Jeffrey" uniqKey="Coombs J" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Pereda, Ray" sort="Pereda, Ray" uniqKey="Pereda R" first="Ray" last="Pereda">Ray Pereda</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Nartker, Thomas" sort="Nartker, Thomas" uniqKey="Nartker T" first="Thomas" last="Nartker">Thomas Nartker</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">05-0359645</idno>
<date when="2005">2005</date>
<idno type="stanalyst">PASCAL 05-0359645 INIST</idno>
<idno type="RBID">Pascal:05-0359645</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000463</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000325</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000435</idno>
<idno type="wicri:doubleKey">1017-2653:2005:Taghva K:address:extraction:using</idno>
<idno type="wicri:Area/Main/Merge">001465</idno>
<idno type="wicri:Area/Main/Curation">001419</idno>
<idno type="wicri:Area/Main/Exploration">001419</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Address extraction using Hidden Markov Models</title>
<author>
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Coombs, Jeffrey" sort="Coombs, Jeffrey" uniqKey="Coombs J" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Pereda, Ray" sort="Pereda, Ray" uniqKey="Pereda R" first="Ray" last="Pereda">Ray Pereda</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Nartker, Thomas" sort="Nartker, Thomas" uniqKey="Nartker T" first="Thomas" last="Nartker">Thomas Nartker</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
<imprint>
<date when="2005">2005</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Hidden Markov models</term>
<term>High precision</term>
<term>Implementation</term>
<term>Information extraction</term>
<term>Optical character recognition</term>
<term>Probabilistic approach</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Modèle Markov variable cachée</term>
<term>Implémentation</term>
<term>Reconnaissance optique caractère</term>
<term>Précision élevée</term>
<term>Extraction information</term>
<term>Approche probabiliste</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper presents the implementation and evaluation of a Hidden Markov Model to extract addresses from OCR text. Although Hidden Markov Models discover addresses with high precision and recall, this type of Information Extraction task seems to be affected negatively by the presence of OCR text.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Nevada</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Nevada">
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
</region>
<name sortKey="Coombs, Jeffrey" sort="Coombs, Jeffrey" uniqKey="Coombs J" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
<name sortKey="Nartker, Thomas" sort="Nartker, Thomas" uniqKey="Nartker T" first="Thomas" last="Nartker">Thomas Nartker</name>
<name sortKey="Pereda, Ray" sort="Pereda, Ray" uniqKey="Pereda R" first="Ray" last="Pereda">Ray Pereda</name>
</country>
</tree>
</affiliations>
</record>
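
The record above can be read with Python's standard `xml.etree.ElementTree`, with one caveat: in the form shown on this page it uses the `inist:` and `wicri:` prefixes without declaring them, so a namespace-aware parser rejects it as-is. The sketch below is a minimal approach under that assumption: it wraps a trimmed excerpt of the record in an element that binds both prefixes to placeholder URIs (`urn:example:…`, hypothetical, not the real Wicri or INIST namespaces), then pulls out the title, the authors, the keywords grouped by `scheme`, and the `wicri:level` attribute. The helper name `summarize` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the record above (two authors, one keyword scheme).
RECORD = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">Address extraction using Hidden Markov Models</title>
<author>
<name sortKey="Taghva, Kazem" first="Kazem" last="Taghva">Kazem Taghva</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01"><s3>USA</s3></inist:fA14>
<country>États-Unis</country>
</affiliation>
</author>
<author>
<name sortKey="Coombs, Jeffrey" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
</author>
</titleStmt></fileDesc>
<profileDesc><textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Hidden Markov models</term>
<term>Information extraction</term>
</keywords>
</textClass></profileDesc>
</teiHeader></TEI></record>"""

# The inist: and wicri: prefixes are undeclared in the record, so bind them
# to placeholder (hypothetical) URIs on a wrapper element before parsing.
WICRI_NS = "urn:example:wicri"
WRAPPER = '<wrapper xmlns:inist="urn:example:inist" xmlns:wicri="{ns}">{body}</wrapper>'


def summarize(record_xml):
    """Return the title, 'Last, First' names, keywords by scheme and wicri:level values."""
    root = ET.fromstring(WRAPPER.format(ns=WICRI_NS, body=record_xml))
    title = root.find(".//titleStmt/title").text
    authors = [
        f"{name.get('last')}, {name.get('first')}"
        for name in root.iterfind(".//titleStmt/author/name")
    ]
    keywords = {
        kw.get("scheme"): [term.text for term in kw.iter("term")]
        for kw in root.iter("keywords")
    }
    # ElementTree keys namespaced attributes as "{uri}localname".
    levels = [aff.get(f"{{{WICRI_NS}}}level") for aff in root.iter("affiliation")]
    return title, authors, keywords, levels
```

Wrapping rather than editing the record keeps the original text untouched; if the full Dilib files declare these namespaces elsewhere, the real URIs should replace the placeholders.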

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001419 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001419 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:05-0359645
   |texte=   Address extraction using Hidden Markov Models
}}
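
Since the template call is plain text, links for other records could be produced mechanically. The helper below is a hypothetical sketch (neither its name nor its argument names belong to Wicri or Dilib); it assumes the template takes exactly the seven parameters shown above and only reproduces that layout, padding each `|name=` label so the values line up.

```python
def explor_link(wiki, area, flux, step, key_type, key, text):
    """Build an Explor lien template call in the layout used on this page (hypothetical helper)."""
    params = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", step),      # processing step, e.g. "Exploration"
        ("type", key_type),   # key type, e.g. "RBID"
        ("clé", key),         # record key
        ("texte", text),      # link text
    ]
    lines = ["{{Explor lien"]
    for name, value in params:
        # Pad "|name=" to 10 characters so the values align, as in the example.
        lines.append("   " + f"|{name}=".ljust(10) + value)
    lines.append("}}")
    return "\n".join(lines)
```

Calling `explor_link("Ticri/CIDE", "OcrV1", "Main", "Exploration", "RBID", "Pascal:05-0359645", "Address extraction using Hidden Markov Models")` reproduces the call above character for character.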

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024